Bellman goes Relational (extended abstract)¹
Abstract
We introduce ReBel, a relational Bellman update operator that can be used for Markov Decision Processes in possibly infinite relational domains. Using ReBel, we develop a relational value iteration algorithm.

1 Relational Markov Decision Processes

Many reinforcement learning (RL) and dynamic programming techniques have been developed for solving Markov Decision Processes (MDPs). Until recently, the representation of MDPs was limited to states and actions that were discrete symbols or were factored into propositions or attribute-value pairs. Recent approaches have shown that this representation can be upgraded to relational (or first-order) representations, thereby enabling the use of first-order logical languages to represent states and actions in terms of objects and relations such as inRoom(robot, kitchen) and driveTo(library). Although some approaches have appeared for solving relational MDPs (RMDPs) (e.g. relational RL [2]), little work has considered exact, model-based solution techniques. The only example is the work by Boutilier et al. (2001), which employs the situation calculus to model an RMDP that is then solved by a value iteration algorithm. However, because of the complexity of the language, that algorithm was not yet fully automated; for example, the simplification of the resulting expressions was done manually. Here we show that by using a restricted language, the simplification becomes computationally feasible, and we develop a fully automatic value iteration algorithm. A further aim of the paper is to give some insight into the framework of relational RL.

¹ The full paper appeared in the Proceedings of the 21st International Conference on Machine Learning (ICML’04), July 4–8, 2004, Banff, Canada.

2 Relational Bellman Operator

The first step is the introduction of a logical formalism to specify MDPs over relational domains. A constraint logic programming language is used to define four
ingredients:
(1) abstract states are conjunctions of logical atoms that aggregate sets of concrete states;
(2) abstract actions are similar to first-order, probabilistic STRIPS operators;
(3) abstract rewards form a reward model describing state-based rewards;
(4) integrity constraints describe domain constraints.

Based on this formalism, a relational upgrade of the Bellman backup operator can be defined. Starting from an initial state-value function, each value backup step consists of the following operations:
(1) For each action rule, we use regression to compute the weakest preconditions (wps) of the abstract states with respect to that rule.
(2) For each wp derived from an action rule, we compute a Q-value, which defines a Q-rule.
(3) Each Q-rule defines only a partial Q-value, because it is based on a single outcome of the action. All Q-rules for the different outcomes of the same action are combined into a complete Q-rule.
(4) Because $V(s) = \max_a Q(s, a)$, we maximize over the set of Q-rules to obtain an equivalent state-value function.

Overall, these four steps refine the current value partition $V^t$ into $V^{t+1}$. Relational value iteration can now be implemented by repeated application of these operations. Starting from the state-value function $V^0$ that defines the goal-state rewards, we compute the series $V^1, \dots, V^n$ until a stopping criterion is met.
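To make ingredient (1) concrete, here is a small sketch of how an abstract state can stand for a whole set of concrete states. It assumes a deliberately simplified encoding that is not the paper's constraint logic programming language. An atom is a Python tuple such as ("inRoom", "robot", "R"). Arguments that start with an uppercase letter are logical variables. An abstract state covers a concrete state (a set of ground atoms) if some substitution maps every one of its atoms onto a fact of that state. This is plain θ-subsumption, and any constraints are ignored. The helpers covers() and value() are hypothetical. value() reads a value function as an ordered list of (abstract state, value) rules, so that each rule aggregates all the concrete states it covers.

```python
from typing import Dict, FrozenSet, Iterator, List, Optional, Sequence, Tuple

# Simplified encoding (assumption, not the paper's language): an atom is a tuple
# (predicate, arg1, ...); arguments starting with an uppercase letter are variables.
Atom = Tuple[str, ...]
Subst = Dict[str, str]


def is_var(term: str) -> bool:
    return term[:1].isupper()


def match_atom(pattern: Atom, fact: Atom, theta: Subst) -> Optional[Subst]:
    """Extend theta so that pattern, under theta, equals the ground fact; None if impossible."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    extended = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if extended.get(p, f) != f:
                return None  # variable already bound to a different constant
            extended[p] = f
        elif p != f:
            return None  # constants must agree
    return extended


def substitutions(abstract: Sequence[Atom], state: FrozenSet[Atom], theta: Subst) -> Iterator[Subst]:
    """Yield every substitution that maps all atoms of `abstract` onto facts in `state`."""
    if not abstract:
        yield theta
        return
    for fact in state:
        extended = match_atom(abstract[0], fact, theta)
        if extended is not None:
            yield from substitutions(abstract[1:], state, extended)


def covers(abstract: Sequence[Atom], state: FrozenSet[Atom]) -> bool:
    """True if the abstract state (a conjunction of atoms) covers the concrete state."""
    return next(substitutions(abstract, state, {}), None) is not None


def value(rules: List[Tuple[List[Atom], float]], state: FrozenSet[Atom], default: float = 0.0) -> float:
    """Hypothetical decision-list reading of a value function: the first covering rule wins."""
    for abstract, v in rules:
        if covers(abstract, state):
            return v
    return default


if __name__ == "__main__":
    coffee_here = [("inRoom", "robot", "R"), ("at", "coffee", "R")]
    rules = [(coffee_here, 10.0), ([("inRoom", "robot", "R")], 1.0)]
    s1 = frozenset({("inRoom", "robot", "kitchen"), ("at", "coffee", "kitchen")})
    s2 = frozenset({("inRoom", "robot", "library"), ("at", "coffee", "kitchen")})
    print(value(rules, s1), value(rules, s2))  # 10.0 1.0
```

The variable "R" must be bound consistently across both atoms. As a result, the first rule covers only the concrete states in which the robot and the coffee are in the same room, whatever that room is. Every other state that places the robot somewhere falls through to the second rule.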
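Backup steps (3) and (4) and the stopping criterion are easiest to see in the ground setting that ReBel lifts. The sketch below is ordinary (propositional) value iteration over an explicitly enumerated toy domain, not ReBel itself. ReBel performs the same kind of backup over abstract states without enumerating the concrete ones. The sketch makes several assumptions. Rewards are state-based, with $Q(s,a) = R(s) + \gamma \sum_{s'} P(s' \mid s,a)\,V(s')$. $V^0$ is the reward function, and the discount is gamma = 0.9. Iteration stops once the largest change between $V^t$ and $V^{t+1}$ drops below eps. The room domain, the 0.9 success probability, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = str
Action = str
Outcome = Tuple[float, State]  # (probability, successor state)


def value_iteration(
    states: Sequence[State],
    actions: Sequence[Action],
    outcomes: Callable[[State, Action], List[Outcome]],
    reward: Callable[[State], float],
    gamma: float = 0.9,
    eps: float = 1e-6,
    max_iters: int = 10_000,
) -> Tuple[Dict[State, float], int]:
    """Ground value iteration; returns the final value function and the iteration count."""
    # V^0 is the reward function itself (assumed: state-based rewards).
    v = {s: reward(s) for s in states}
    for t in range(1, max_iters + 1):
        new_v: Dict[State, float] = {}
        for s in states:
            q_values = []
            for a in actions:
                # Each outcome contributes a partial term p * V(s'); summing them gives the complete Q-value.
                partial = [p * v[s_next] for p, s_next in outcomes(s, a)]
                q_values.append(reward(s) + gamma * sum(partial))
            # V(s) = max_a Q(s, a)
            new_v[s] = max(q_values)
        # Stopping criterion (assumed): largest change between V^t and V^{t+1} below eps.
        if max(abs(new_v[s] - v[s]) for s in states) < eps:
            return new_v, t
        v = new_v
    return v, max_iters


# Hypothetical toy domain: a robot that drives between rooms.
ROOMS = ["kitchen", "library", "office"]
ACTIONS = [f"driveTo({room})" for room in ROOMS]


def outcomes(state: State, action: Action) -> List[Outcome]:
    target = action[len("driveTo("):-1]
    # The drive succeeds with probability 0.9; otherwise the robot stays put.
    return [(0.9, target), (0.1, state)]


def reward(state: State) -> float:
    return 10.0 if state == "kitchen" else 0.0


if __name__ == "__main__":
    values, iterations = value_iteration(ROOMS, ACTIONS, outcomes, reward)
    print(iterations, {s: round(x, 3) for s, x in values.items()})
```

On this toy domain, V(kitchen) converges to about 10 / (1 - 0.9) = 100. The library and the office each converge to about 81 / 0.91 ≈ 89.01, because from either room the best action is driveTo(kitchen).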
Publication date: 2008